-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[ENH] Add local support for schema #5714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
26fbb95 to
50dc87a
Compare
50dc87a to
4f90e30
Compare
d88801a to
a1642ad
Compare
4f90e30 to
4170cd0
Compare
a1642ad to
d9975c0
Compare
cc49da4 to
82c400d
Compare
d9975c0 to
73d6c8c
Compare
82c400d to
5a6e468
Compare
73d6c8c to
f60a76e
Compare
5a6e468 to
8e145e7
Compare
45ca931 to
3e6652e
Compare
cb029e5 to
ebae17b
Compare
ebae17b to
9bc109b
Compare
3e6652e to
bdf764b
Compare
bdf764b to
2def8f3
Compare
9bc109b to
d22f13f
Compare
2def8f3 to
1695d4a
Compare
| let schema = match first_row.get::<Option<&str>, _>(7) { | ||
| Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Error handling inconsistency: The schema deserialization logic handles empty strings and "null" values differently across the codebase. In the SQLite implementation, it checks for both conditions, but other parts of the code may not handle these edge cases consistently.
match first_row.get::<Option<&str>, _>(7) {
Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => {
// This logic should be centralized in a helper function
// to ensure consistency across all schema deserialization points
}
// ...
}Consider creating a centralized deserialize_schema_from_db helper function to ensure consistent handling of these edge cases.
Context for Agents
[**BestPractice**]
Error handling inconsistency: The schema deserialization logic handles empty strings and "null" values differently across the codebase. In the SQLite implementation, it checks for both conditions, but other parts of the code may not handle these edge cases consistently.
```rust
match first_row.get::<Option<&str>, _>(7) {
Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => {
// This logic should be centralized in a helper function
// to ensure consistency across all schema deserialization points
}
// ...
}
```
Consider creating a centralized `deserialize_schema_from_db` helper function to ensure consistent handling of these edge cases.
File: rust/sysdb/src/sqlite.rs
Line: 84417e078b to
32d4b95
Compare
rust/sqlite/src/db.rs
Outdated
| // Check if database has more applied migrations than available source migrations | ||
| if applied_migrations.len() > source_migrations.len() { | ||
| return Ok(vec![]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Potential null database migration issue: The code checks if applied_migrations.len() > source_migrations.len() and returns empty migrations, but this could mask real migration problems. If the database has more applied migrations than source migrations, this suggests a version mismatch or corrupted migration state that should be explicitly handled.
Suggested Change
| // Check if database has more applied migrations than available source migrations | |
| if applied_migrations.len() > source_migrations.len() { | |
| return Ok(vec![]); | |
| // Check if database has more applied migrations than available source migrations | |
| if applied_migrations.len() > source_migrations.len() { | |
| return Err(SqliteError::MigrationVersionMismatch( | |
| format!( | |
| "Database has {} applied migrations but only {} source migrations available", | |
| applied_migrations.len(), | |
| source_migrations.len() | |
| ) | |
| )); | |
| } |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
Potential null database migration issue: The code checks `if applied_migrations.len() > source_migrations.len()` and returns empty migrations, but this could mask real migration problems. If the database has more applied migrations than source migrations, this suggests a version mismatch or corrupted migration state that should be explicitly handled.
<details>
<summary>Suggested Change</summary>
```suggestion
// Check if database has more applied migrations than available source migrations
if applied_migrations.len() > source_migrations.len() {
return Err(SqliteError::MigrationVersionMismatch(
format!(
"Database has {} applied migrations but only {} source migrations available",
applied_migrations.len(),
source_migrations.len()
)
));
}
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: rust/sqlite/src/db.rs
Line: 1598ec4278 to
bee55b4
Compare
| .get_hnsw_config_with_legacy_fallback(segment)? | ||
| .schema | ||
| .as_ref() | ||
| .map(|schema| schema.get_internal_hnsw_config_with_legacy_fallback(segment)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanketkedia is it fine to not reconcile on the writer? i believe it should come through frontend, so it should reconcile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, I think we need this here. The reconcile of schema with config happens in the handler for BackfillMessage. And then we need to reconcile this with legacy metadata here so this seems correct and necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I'd have liked all the three reconciles to happen at one place but that's not what it is now even with collection config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i meant reconcile schema and config. yes the legacy fallback is needed everywhere
jairad26
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use validate_schema from validators.rs in bindings.rs
|
nvm |
bee55b4 to
296af1b
Compare
296af1b to
dd3f346
Compare
| // This is for backwards compatibility so that users who migrate to distributed | ||
| // from local don't break their code. | ||
| KnnIndex::Spann => { | ||
| let internal_config = if let Some(space) = hnsw.space { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't you reconcile with legacy metadata here before getting the space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I feel like we should remove the blanket reconcile at the top and reconcile here in various places for readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
dd3f346 to
df8c8c8
Compare
|
In general, I feel like the reconciliation logic is all over the place. But that's from before (collection config) so ok for now. But ideally once you've read from sysdb, you should assume that it has schema set and properly reconciled with both collection config and legacy metadata and just use it downstream |
## Description of changes _Summarize the changes made by this PR._ - Improvements & Bug fixes - This PR adds support for schema in sqlite sysdb, correctly reconciling with schema, legacy metadata, and supporting configuration updates. It also adds support for passing schema via bindings, to allow for local chroma support. It also updates cli usage of to allow copying of schema - New functionality - ... ## Test plan _How are these changes tested?_ expanded schema e2e tests to ensure bindings and single node all work as intended - [ x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Migration plan _Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?_ ## Observability plan _What is the plan to instrument and monitor this change?_ ## Documentation Changes _Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_

Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
expanded schema e2e tests to ensure bindings and single node all work as intended
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?